Imagination Based Sample Construction for Zero-Shot Learning
Zero-shot learning (ZSL), which aims to recognize unseen classes with no
labeled training samples, efficiently tackles the problem of missing labeled
data in image retrieval. Nowadays there are mainly two types of popular methods
for ZSL to recognize images of unseen classes: probabilistic reasoning and
feature projection. Different from these existing types of methods, we propose
a new method: sample construction to deal with the problem of ZSL. Our proposed
method, called Imagination Based Sample Construction (IBSC), innovatively
constructs image samples of target classes in feature space by mimicking human
associative cognition process. Based on an association between attribute and
feature, target samples are constructed from different parts of various
samples. Furthermore, dissimilarity representation is employed to select
high-quality constructed samples which are used as labeled data to train a
specific classifier for those unseen classes. In this way, zero-shot learning
is turned into a supervised learning problem. As far as we know, this is the
first work to construct samples for ZSL; thus, our work can be viewed as a
baseline for future sample construction methods. Experiments on four benchmark
datasets show the superiority of our proposed method.
Comment: Accepted as a short paper in ACM SIGIR 201
TagBook: A Semantic Video Representation without Supervision for Event Detection
We consider the problem of event detection in video for scenarios where only
few, or even zero, examples are available for training. For this challenging
setting, the prevailing solutions in the literature rely on a semantic video
representation obtained from thousands of pre-trained concept detectors.
Different from existing work, we propose a new semantic video representation
that is based on freely available social tagged videos only, without the need
for training any intermediate concept detectors. We introduce a simple
algorithm that propagates tags from a video's nearest neighbors, similar in
spirit to the ones used for image retrieval, but redesign it for video event
detection by including video source set refinement and varying the video tag
assignment. We call our approach TagBook and study its construction,
descriptiveness and detection performance on the TRECVID 2013 and 2014
multimedia event detection datasets and the Columbia Consumer Video dataset.
Despite its simple nature, the proposed TagBook video representation is
remarkably effective for few-example and zero-example event detection, even
outperforming very recent state-of-the-art alternatives building on supervised
representations.
Comment: accepted for publication as a regular paper in the IEEE Transactions
on Multimedi
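The neighbor-based tag propagation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine similarity measure, and the equal-weight voting are assumptions made for illustration, and the source set refinement and tag assignment variants mentioned in the abstract are left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_tags(query, source_feats, source_tags, vocabulary, k=2):
    """Represent a video by the tags of its k nearest tagged source videos.

    Hypothetical helper: each neighbor casts one vote per distinct tag, and
    the result is the fraction of neighbors voting for each vocabulary tag.
    """
    ranked = sorted(
        range(len(source_feats)),
        key=lambda i: cosine(query, source_feats[i]),
        reverse=True,
    )
    neighbors = ranked[:k]
    votes = Counter()
    for i in neighbors:
        votes.update(set(source_tags[i]))
    n = len(neighbors)
    return [votes[t] / n if n else 0.0 for t in vocabulary]


# Toy data: three tagged source videos with 2-D features (made up).
source_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
source_tags = [["dog", "park"], ["dog"], ["cake"]]
vocabulary = ["dog", "park", "cake"]
representation = propagate_tags([1.0, 0.0], source_feats, source_tags, vocabulary)
print(representation)
```

The resulting vector has one entry per vocabulary tag, so it could in principle be compared against a textual event description or fed to a classifier when a few examples are available.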
Adaptive Tag Selection for Image Annotation
Not all tags are relevant to an image, and the number of relevant tags is
image-dependent. Although many methods have been proposed for image
auto-annotation, the question of how to determine the number of tags to be
selected per image remains open. The main challenge is that for a large tag
vocabulary, there is often a lack of ground truth data for acquiring optimal
cutoff thresholds per tag. In contrast to previous works that pre-specify the
number of tags to be selected, we propose in this paper adaptive tag selection.
The key insight is to divide the vocabulary into two disjoint subsets, namely a
seen set consisting of tags having ground truth available for optimizing their
thresholds and a novel set consisting of tags without any ground truth. Such a
division allows us to estimate how many tags shall be selected from the novel
set according to the tags that have been selected from the seen set. The
effectiveness of the proposed method is justified by our participation in the
ImageCLEF 2014 image annotation task. On a set of 2,065 test images with ground
truth available for 207 tags, the benchmark evaluation shows that compared to
the popular top-k strategy, which obtains an F-score of 0.122, adaptive tag
selection achieves a higher F-score of 0.223. Moreover, by treating the
underlying image annotation system as a black box, the new method can be used
as an easy plug-in to boost the performance of existing systems.
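The seen/novel division described above can be sketched as follows. The abstract does not state the estimation rule, so the proportional rule used here is an assumption chosen for illustration, and the function and variable names are hypothetical.

```python
def adaptive_select(scores, thresholds, novel_tags):
    """Select tags for one image from a seen set and a novel set.

    scores:     tag -> relevance score from any annotation system (black box)
    thresholds: seen tag -> cutoff threshold tuned on ground truth
    novel_tags: tags with no ground truth, hence no tuned threshold

    Assumption (not from the paper): the number of novel tags to keep is
    proportional to the number of seen tags selected, scaled by the ratio
    of the two set sizes.
    """
    seen_selected = [t for t, th in thresholds.items() if scores.get(t, 0.0) >= th]
    ratio = len(novel_tags) / len(thresholds) if thresholds else 0.0
    n_novel = round(len(seen_selected) * ratio)
    ranked_novel = sorted(novel_tags, key=lambda t: scores.get(t, 0.0), reverse=True)
    return seen_selected + ranked_novel[:n_novel]


# Toy scores for one image (made up).
scores = {"dog": 0.9, "cat": 0.2, "park": 0.7, "grass": 0.6, "car": 0.1, "sky": 0.5}
thresholds = {"dog": 0.5, "cat": 0.5}
novel_tags = ["park", "grass", "car", "sky"]
print(adaptive_select(scores, thresholds, novel_tags))
```

Because the function only consumes scores, it treats the underlying annotation system as a black box, in the spirit of the plug-in use described in the abstract.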
Co-Teaching for Unsupervised Domain Adaptation and Expansion
Unsupervised Domain Adaptation (UDA) is known to trade a model's performance
on a source domain for improving its performance on a target domain. To resolve
the issue, Unsupervised Domain Expansion (UDE) has been proposed recently to
adapt the model for the target domain as UDA does, and in the meantime maintain
its performance on the source domain. For both UDA and UDE, a model tailored to
a given domain, be it the source or the target domain, is assumed to handle
samples from that domain well. We question this assumption by reporting
the existence of cross-domain visual ambiguity: due to the lack of a
crystal-clear boundary between the two domains, samples from one domain can be
visually
close to the other domain. We exploit this finding and accordingly propose in
this paper Co-Teaching (CT) that consists of knowledge distillation based CT
(kdCT) and mixup based CT (miCT). Specifically, kdCT transfers knowledge from a
leader-teacher network and an assistant-teacher network to a student network,
so the cross-domain visual ambiguity will be better handled by the student.
Meanwhile, miCT further enhances the generalization ability of the student.
Comprehensive experiments on two image-classification benchmarks and two
driving-scene-segmentation benchmarks justify the viability of the proposed
method.
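For readers unfamiliar with mixup, the generic interpolation that mixup-based training builds on can be sketched as below. This shows standard mixup only; how miCT pairs samples, chooses the mixing coefficient, or combines the result with the teacher networks is not stated in the abstract, so none of that is represented here. The function name and the toy inputs are illustrative.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of two inputs and their label vectors.

    lam is the mixing coefficient in [0, 1]; in practice it is often drawn
    from a Beta distribution, but it is passed explicitly here to keep the
    sketch deterministic.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Toy example: two 2-D samples with one-hot labels (made up).
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
print(mixed_x, mixed_y)
```

Training on such interpolated pairs encourages smoother decision boundaries between samples, which is one common explanation for the generalization benefit attributed to mixup.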
3D Object Detection for Autonomous Driving: A Survey
Autonomous driving is regarded as one of the most promising remedies to
shield human beings from severe crashes. To this end, 3D object detection
serves as the core basis of such a perception system, especially for the sake
of path planning, motion prediction, collision avoidance, etc. Generally, stereo
or monocular images with corresponding 3D point clouds are already a standard
layout for 3D object detection, out of which point clouds are increasingly
prevalent because they provide accurate depth information. Despite existing
efforts, 3D object detection on point clouds is still in its infancy due to
the inherent sparseness and irregularity of point clouds, the misalignment
between the camera view and the LiDAR bird's-eye view for modality synergies,
occlusions and scale variations at long distances, etc. Recently, profound
progress has been made in 3D object detection, with a large body of literature
being investigated to address this vision task. As such, we present a
comprehensive review of the latest progress in this field covering all the main
topics including sensors, fundamentals, and the recent state-of-the-art
detection methods with their pros and cons. Furthermore, we introduce metrics
and provide quantitative comparisons on popular public datasets. Avenues
for future work are identified after an in-depth
analysis of the surveyed works. Finally, we conclude this paper.
Comment: 3D object detection, Autonomous driving, Point cloud